Processing Text files

1. cat

In [22]:
cat -A a.txt # -vET (All: non-printable, end of lines, tabs)
hi^M$
^M$
this file was ^M$
made in ms windows 7.^M$
^M$
^Ilets see^M$
what is going on.^M$
^M$
  here.
In [29]:
tr -dc '[:print:]\n' < a.txt | cat -A # delete, complement set1
hi$
$
this file was $
made in ms windows 7.$
$
lets see$
what is going on.$
$
  here.
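The `tr -dc '[:print:]\n'` filter above also removes the tab in front of "lets see", because a tab is not a printable character. If only the Windows carriage returns should go, deleting `\r` alone is one option. A minimal sketch, using a made-up two-line sample rather than the real a.txt:

```shell
# Made-up sample with CRLF line endings and a leading tab (stands in for a.txt).
sample=$(mktemp)
printf 'hi\r\n\tlets see\r\n' > "$sample"

# Delete only the carriage returns; the tab (shown as ^I) survives.
only_cr_removed=$(tr -d '\r' < "$sample" | cat -A)
printf '%s\n' "$only_cr_removed"

rm -f "$sample"
```

The `cat -A` at the end is only there to make the remaining control characters visible.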
In [32]:
cat -n txt
     1	hi
     2	
     3	it's
     4	
     5	
     6	
     7	a new file
In [17]:
cat -ns txt
     1	hi
     2	
     3	it's
     4	
     5	a new file
In [37]:
cat -b txt
     1	hi

     2	it's



     3	a new file
In [44]:
cat -b < txt | sed '/^$/d' # quote the sed expression so the shell leaves it alone
     1	hi
     2	it's
     3	a new file

2. tac

In [54]:
grep -v '^#' /etc/default/hddtemp | sed '/^$/d' > hddtemp
In [55]:
cat hddtemp
RUN_DAEMON="false"
DISKS_NOPROBE=""
INTERFACE="127.0.0.1"
RUN_SYSLOG="0"
OPTIONS=""
In [56]:
tac hddtemp
OPTIONS=""
RUN_SYSLOG="0"
INTERFACE="127.0.0.1"
DISKS_NOPROBE=""
RUN_DAEMON="false"
In [ ]:
dmesg | tac
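tac reverses the order of lines, not the characters inside them, which is what makes `dmesg | tac` a "newest first" view. A small sketch with invented log lines:

```shell
# Made-up log lines, oldest first.
log=$(printf '%s\n' 'boot' 'disk found' 'network up')

# Whole input reversed: newest line first.
reversed=$(printf '%s\n' "$log" | tac)

# Only the two newest lines, newest first.
newest_two=$(printf '%s\n' "$log" | tail -n 2 | tac)

printf '%s\n' "$reversed" '--' "$newest_two"
```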

3. sort

In [53]:
ls -l /etc/grub.d | head -5
total 76
-rwxr-xr-x 1 root root  9791 Jul 23  2016 00_header
-rwxr-xr-x 1 root root  6258 Mar 15  2016 05_debian_theme
-rwxr-xr-x 1 root root 12512 Mar  2 00:31 10_linux
-rwxr-xr-x 1 root root 11082 Jul 23  2016 20_linux_xen
In [4]:
ls -l /etc/grub.d | head -5 | sort -k5 -t ' ' # -k: sort key (field[,end field]); -t: field separator
total 76
-rwxr-xr-x 1 root root 11082 Jul 23  2016 20_linux_xen
-rwxr-xr-x 1 root root 12512 Mar  2 00:31 10_linux
-rwxr-xr-x 1 root root  6258 Mar 15  2016 05_debian_theme
-rwxr-xr-x 1 root root  9791 Jul 23  2016 00_header
In [6]:
ls -l /etc/grub.d | head -5 | sort -n -r -k5,5
-rwxr-xr-x 1 root root 12512 Mar  2 00:31 10_linux
-rwxr-xr-x 1 root root 11082 Jul 23  2016 20_linux_xen
-rwxr-xr-x 1 root root  9791 Jul 23  2016 00_header
-rwxr-xr-x 1 root root  6258 Mar 15  2016 05_debian_theme
total 76
In [170]:
cat txt
coca	1.1	12:20
pepsi	2.0	00:11
coca	11.3	00:33
fanta	3.01	01:21
coca	4.6	19:01
fanta	4.0	15:12
pepsi	2.01	00:01
In [171]:
sort txt
coca	1.1	12:20
coca	11.3	00:33
coca	4.6	19:01
fanta	3.01	01:21
fanta	4.0	15:12
pepsi	2.0	00:11
pepsi	2.01	00:01
In [14]:
sort -k1,1 -k2,2n txt
coca	1.1	12:20
coca	4.6	19:01
coca	11.3	00:33
fanta	3.01	01:21
fanta	4.0	15:12
pepsi	2.0	00:11
pepsi	2.01	00:01
In [24]:
ls -lF /etc/network  | sort -k8.4 # key starts at character 4 of field 8
total 24
-rw-r--r-- 1 root root  309 Aug 12  2016 interfaces
drwxr-xr-x 2 root root 4096 Jan 24  2016 interfaces.d/
drwxr-xr-x 2 root root 4096 Mar  3 15:18 if-down.d/
drwxr-xr-x 2 root root 4096 Mar 21 01:44 if-up.d/
drwxr-xr-x 2 root root 4096 Apr 13 11:54 if-post-down.d/
drwxr-xr-x 2 root root 4096 Apr 13 11:54 if-pre-up.d/
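In the first sort cell of this section, `-t ' '` makes every single space a field boundary, so a right-aligned size column is preceded by an empty field on some lines, and without `-n` the key is compared as text. The sketch below contrasts that with a numeric key using the default blank separator. The three sample lines are invented for illustration, and `LC_ALL=C` is set only to make the ordering reproducible.

```shell
# Made-up, space-aligned sample resembling the size column of `ls -l`.
sample=$(mktemp)
printf '%s\n' 'a  9791 x' 'b 12512 y' 'c  6258 z' > "$sample"

# Default separator (runs of blanks): field 2 is the size on every line.
# -k2,2n restricts the key to field 2 and compares it numerically.
by_size=$(LC_ALL=C sort -k2,2n "$sample")

# With -t ' ' field 2 is empty on the lines with two spaces (numeric value 0),
# so those lines sort first and the result is no longer in size order.
with_t=$(LC_ALL=C sort -t ' ' -k2,2n "$sample")

printf '%s\n' "$by_size" '--' "$with_t"
rm -f "$sample"
```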

4. uniq

In [38]:
cat txt
grub.d
passwd
grub.d
shadow
passwd
default
In [62]:
sort txt | uniq -i # makes output unique, ignore case
default
grub.d
passwd
shadow
In [40]:
sort txt | uniq -c # count
      1 default
      2 grub.d
      2 passwd
      1 shadow
In [41]:
sort txt | uniq -d # only duplicates
grub.d
passwd
In [43]:
sort txt | uniq -u # only unique rows
default
shadow
In [50]:
ln txt h-txt; ln txt h2-txt
In [55]:
find -maxdepth 1 -type f -links +1 -printf "%p %i\n" | tee out
./txt 394077
./h2-txt 394077
./h-txt 394077
In [59]:
sort out | uniq -d
In [61]:
sort out | uniq -f1 -dc # -f N: skip the first N fields when comparing
      3 ./h2-txt 394077
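uniq compares only adjacent lines, which is why every example above sorts first. The sketch below shows what happens without the sort and then builds a simple frequency ranking. It reuses the six names from the `cat txt` cell; the temporary file name is arbitrary.

```shell
# Same six names as in the txt file above.
names=$(mktemp)
printf '%s\n' grub.d passwd grub.d shadow passwd default > "$names"

# Without sorting, no two equal lines are adjacent, so nothing is collapsed.
unsorted_count=$(uniq "$names" | wc -l)

# Sort, count, then order by count (highest first) and by name within a count.
ranking=$(LC_ALL=C sort "$names" | uniq -c | LC_ALL=C sort -k1,1nr -k2,2)

printf 'lines after plain uniq: %s\n' "$unsorted_count"
printf '%s\n' "$ranking"
rm -f "$names"
```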

5. cut

In [72]:
head /etc/group | grep adm
adm:x:4:syslog,milad
In [69]:
cat /etc/group | grep milad | cut -f1 -d: | tr '\n' ' ' | xargs echo
adm cdrom sudo dip plugdev milad lpadmin sambashare wireshark docker
In [70]:
id milad -Gn
milad adm cdrom sudo dip plugdev lpadmin sambashare wireshark docker
In [81]:
head -4 /etc/passwd | cut --complement -f2-4,7 -d:
root:root:/root
daemon:daemon:/usr/sbin
bin:bin:/bin
sys:sys:/dev
In [157]:
date | tee txt | cat -A
Thu May  4 23:53:18 IRDT 2017$
In [158]:
cut -f4 -d' ' < txt
4
In [159]:
tr -s ' ' < txt | cut -f4 -d' ' | cut -c 4- #  tr: -s, --squeeze-repeats
53:18
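The date example works because cut treats every single delimiter as a field boundary, so the two spaces before the day create an empty field. The sketch below repeats the steps on a fixed copy of that line so the result is reproducible, and picks minutes and seconds by delimiter rather than by character position, which is one possible alternative to `cut -c 4-`.

```shell
# Fixed copy of the date line from the cell above.
line='Thu May  4 23:53:18 IRDT 2017'

# Two spaces before "4": field 3 is empty and field 4 is the day.
day=$(printf '%s\n' "$line" | cut -d' ' -f4)

# After squeezing repeated spaces, field 4 is the time.
clock=$(printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f4)

# Fields 2 onward of the time, split on ':'.
min_sec=$(printf '%s\n' "$clock" | cut -d: -f2-)

printf '%s | %s | %s\n' "$day" "$clock" "$min_sec"
```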

6. paste

In [27]:
head -5 /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
In [29]:
cut -f1,7 /etc/passwd -d: | head -5
root:/bin/bash
daemon:/usr/sbin/nologin
bin:/usr/sbin/nologin
sys:/usr/sbin/nologin
sync:/bin/sync
In [30]:
cut -f2-6 --complement /etc/passwd -d: | head -5
root:/bin/bash
daemon:/usr/sbin/nologin
bin:/usr/sbin/nologin
sys:/usr/sbin/nologin
sync:/bin/sync
In [32]:
head -5 /etc/passwd | cut -f1 -d: | tee users
root
daemon
bin
sys
sync
In [34]:
head -5 /etc/passwd | cut -f7 -d: | tee shells
/bin/bash
/usr/sbin/nologin
/usr/sbin/nologin
/usr/sbin/nologin
/bin/sync
In [44]:
paste shells users -d':'
/bin/bash:root
/usr/sbin/nologin:daemon
/usr/sbin/nologin:bin
/usr/sbin/nologin:sys
/bin/sync:sync
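Besides merging files column by column, paste can join all lines of a single file into one line with `-s`. That can stand in for the `tr '\n' ' ' | xargs echo` step used in the cut section. A minimal sketch with the first three users and shells from above, written to temporary files:

```shell
users=$(mktemp)
shells=$(mktemp)
printf '%s\n' root daemon bin > "$users"
printf '%s\n' /bin/bash /usr/sbin/nologin /usr/sbin/nologin > "$shells"

# Column-wise merge: line N of each file, joined by ':'.
pairs=$(paste -d: "$users" "$shells")

# Serial merge (-s): all lines of one file joined into a single line.
one_line=$(paste -s -d' ' "$users")

printf '%s\n' "$pairs" '--' "$one_line"
rm -f "$users" "$shells"
```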

7. join

In [10]:
cat namev; echo ---; cat namet;
foo	21
bar	22
baz	21
---
foo	red
bar	blue
baz	blue
In [11]:
join namev namet
foo 21 red
bar 22 blue
baz 21 blue
In [14]:
cat namev; echo ---; cat namet;
21 foo
22 bar
21 baz
---
foo red
bar pink
baz yellow
In [16]:
join -1 2 -2 1 namev namet # use -j N when the join field number is the same in both files
foo 21 red
bar 22 pink
baz 21 yellow
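join expects both inputs to be sorted on the join field. The files above pair up line by line, so their order did not cause trouble, but for data in arbitrary order sorting first is the safer pattern. A sketch with invented files whose lines are deliberately in different orders:

```shell
ages=$(mktemp)
colors=$(mktemp)
printf '%s\n' 'foo 21' 'bar 22' 'baz 21' > "$ages"
printf '%s\n' 'baz yellow' 'foo red' 'bar pink' > "$colors"

# Sort each file on field 1, then join on that field.
# The same locale is used for sort and join so they agree on the ordering.
LC_ALL=C sort -k1,1 "$ages" > "$ages.sorted"
LC_ALL=C sort -k1,1 "$colors" > "$colors.sorted"
joined=$(LC_ALL=C join "$ages.sorted" "$colors.sorted")

printf '%s\n' "$joined"
rm -f "$ages" "$colors" "$ages.sorted" "$colors.sorted"
```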

8. comm

Compare sorted files FILE1 and FILE2 line by line.
When FILE1 or FILE2 (not both) is -, read standard input.
In [29]:
cat f1; echo ---; cat f2
a
b
c
d
---
a
c
x
z
In [26]:
comm f1 f2
		a
b
		c
d
	x
	z
Column 1: lines only in f1; column 2: lines only in f2; column 3: lines in both.
In [9]:
comm -3 f1 f2 # suppress column 3 (lines common to both files)
b
d
	x
	z
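Suppressing two columns at once turns comm into a simple set tool: intersection, or the lines found in only one file. A sketch with the same f1 and f2 contents as above, written to temporary files:

```shell
f1=$(mktemp)
f2=$(mktemp)
printf '%s\n' a b c d > "$f1"
printf '%s\n' a c x z > "$f2"

common=$(LC_ALL=C comm -12 "$f1" "$f2")    # hide columns 1 and 2: lines in both
only_f1=$(LC_ALL=C comm -23 "$f1" "$f2")   # hide columns 2 and 3: lines only in f1
only_f2=$(LC_ALL=C comm -13 "$f1" "$f2")   # hide columns 1 and 3: lines only in f2

printf 'both: %s\n' "$(printf '%s\n' "$common" | paste -s -d' ' -)"
printf 'f1 only: %s\n' "$(printf '%s\n' "$only_f1" | paste -s -d' ' -)"
printf 'f2 only: %s\n' "$(printf '%s\n' "$only_f2" | paste -s -d' ' -)"
rm -f "$f1" "$f2"
```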

9. diff

diff tells you how to change the first file to make it match the second file.

9.1. Default view

  • First file line numbers
  • Letter
    • a: add
    • c: change
    • d: delete
  • Second file line numbers
  • ---: separates the lines of file 1 from the lines of file 2
In [12]:
echo - | cat f1 - f2
blue
red
orange
-
blue
pink

yellow
In [13]:
diff f1 f2
2,3c2,4
< red
< orange
---
> pink
> 
> yellow

Lines 2 to 3 of file one were changed in file two. Replace them with lines 2 to 4 of file two to make the files match.

In [18]:
echo - | cat f1 - f2
blue
red
orange
-
blue
red
yellow
orange
In [19]:
diff f1 f2
2a3
> yellow

After line 2 in file one, add line 3 from file two (which is yellow)

In [21]:
echo - | cat f1 - f2
blue
red
orange
-
blue
orange
In [23]:
diff f1 f2
2d1
< red
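The `2d1` result reads: delete line 2 of file one, after which the two files are in sync at line 1 of file two. In scripts the exit status is often more convenient than the text: diff returns 0 when the files are identical, 1 when they differ, and a higher value on trouble. A sketch that recreates the two files above in temporary locations and captures both:

```shell
f1=$(mktemp)
f2=$(mktemp)
printf '%s\n' blue red orange > "$f1"
printf '%s\n' blue orange > "$f2"

# `|| status=$?` records the non-zero status instead of aborting the script.
status=0
changes=$(diff "$f1" "$f2") || status=$?

printf 'exit status: %s\n' "$status"
printf '%s\n' "$changes"
rm -f "$f1" "$f2"
```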

9.2. Context

  • Two spaces: the line is the same in both files
  • !: the line changed between the files
  • -: the line is removed from the first file
  • +: the line is added from the second file
In [30]:
echo "__-_-_-__" | cat f1 - f2
Ardabil
Yazd
Esfehan
Ahvaz
__-_-_-__
Tehran
Yazd
Ahvaz
Shiraz
In [78]:
diff -c f1 f2
*** f1	2017-05-05 19:35:18.449963816 +0430
--- f2	2017-05-05 17:26:31.142020824 +0430
***************
*** 1,4 ****
! Ardabil
  Yazd
- Esfehan
  Ahvaz
--- 1,4 ----
! Tehran
  Yazd
  Ahvaz
+ Shiraz

Apply the changes to f1:

Change Ardabil to Tehran, remove Esfehan, add Shiraz.

In [34]:
diff f1 f2

9.3. Unified

In [79]:
diff -u f1 f2
--- f1	2017-05-05 19:35:18.449963816 +0430
+++ f2	2017-05-05 17:26:31.142020824 +0430
@@ -1,4 +1,4 @@
-Ardabil
+Tehran
 Yazd
-Esfehan
 Ahvaz
+Shiraz
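In the hunk header `@@ -1,4 +1,4 @@`, the first pair is the start line and length of the hunk in the first file, and the second pair is the same for the second file. Below is a rough sketch that pulls the header out of a unified diff and counts removed and added lines. The grep patterns are a simplification: they skip the `---` and `+++` file headers by requiring a second character that is not `-` or `+`, so a changed line whose content starts with that character would be missed.

```shell
f1=$(mktemp)
f2=$(mktemp)
printf '%s\n' Ardabil Yazd Esfehan Ahvaz > "$f1"
printf '%s\n' Tehran Yazd Ahvaz Shiraz > "$f2"

# diff exits with 1 when the files differ; `|| true` keeps the script going.
unified=$(diff -u "$f1" "$f2") || true

hunk=$(printf '%s\n' "$unified" | grep '^@@')
removed=$(printf '%s\n' "$unified" | grep -c '^-[^-]')
added=$(printf '%s\n' "$unified" | grep -c '^+[^+]')

printf '%s removed=%s added=%s\n' "$hunk" "$removed" "$added"
rm -f "$f1" "$f2"
```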

9.4. Column

In [80]:
diff -y f1 f2
Ardabil							      |	Tehran
Yazd								Yazd
Esfehan							      <
Ahvaz								Ahvaz
							      >	Shiraz

In [81]:
diff f1 f2 # check with default output
1c1
< Ardabil
---
> Tehran
3d2
< Esfehan
4a4
> Shiraz

9.5. Create an ed script

In [39]:
diff f1 f2 -e > changes

In [40]:
cat changes
4a
Shiraz
.
3d
1c
Tehran
.
In [41]:
echo 'w' >> changes # hey ed write to file ;)
In [42]:
cat f1 # before running ed
Ardabil
Yazd
Esfehan
Ahvaz
In [43]:
ed < changes f1
27
25
In [44]:
cat f1 # after ed did the job
Tehran
Yazd
Ahvaz
Shiraz
In [45]:
diff f1 f2
In [46]:
md5sum f1 f2
9b2d361c3076ad25849ff2fcb87c2c8a  f1
9b2d361c3076ad25849ff2fcb87c2c8a  f2

10. patch

In [58]:
diff f1 f2
1c1
< Ardabil
---
> Tehran
3d2
< Esfehan
4a4
> Shiraz

In [68]:
diff -u f1 f2 > diff

In [69]:
cat diff
--- f1	2017-05-05 18:43:09.809986897 +0430
+++ f2	2017-05-05 17:26:31.142020824 +0430
@@ -1,4 +1,4 @@
-Ardabil
+Tehran
 Yazd
-Esfehan
 Ahvaz
+Shiraz
In [70]:
patch < diff
patching file f1
In [71]:
cat f1
Tehran
Yazd
Ahvaz
Shiraz

11. tr

In [82]:
echo "Here is some text." | tr -d ' '
Hereissometext.
In [93]:
tr -c -d '[:upper:]\n' < f1
A
Y
E
A
In [95]:
echo "Hi_this_is_a-TEST" | tr '[:upper:]_-' '[:lower:] '
hi this is a test
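In the last cell the second set is shorter than the first; GNU tr handles that by repeating the last character of set2 (the space), and other implementations may behave differently. The sketch below writes set2 out in full and adds a common idiom built on `-c` and `-s`: splitting text into one word per line.

```shell
text='Hi_this_is_a-TEST'

# Same translation with set2 written out: one space each for '_' and '-'.
spaced=$(printf '%s\n' "$text" | tr '[:upper:]_-' '[:lower:]  ')

# One word per line: every run of non-alphanumeric characters
# becomes a single newline (-c complements set1, -s squeezes repeats).
words=$(printf '%s\n' "$text" | tr -cs '[:alnum:]' '\n' | tr '[:upper:]' '[:lower:]')

printf '%s\n' "$spaced" '--' "$words"
```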

License

Linux Notes by Milad As (Ravexina) is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

